-
Notifications
You must be signed in to change notification settings - Fork 264
PHPLIB-1323 Implement unlink
for GridFS stream wrapper
#1206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0a08f8e
to
7cbf64e
Compare
src/GridFS/WritableStream.php
Outdated
foreach ($revisions as $file) { | ||
$this->collectionWrapper->findFileById($file->get('_id')); | ||
$this->collectionWrapper->deleteFileAndChunksById($file->get('_id')); | ||
$count++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we can improve on this by collecting all IDs, then deleting all files with the given filename, followed by removing all chunks for the IDs we previously found.
While there's always a chance to end up in an undesirable state, e.g. by having older revisions of files remaining, the suggestion above would reduce this by instructing the server to delete all file documents with a given name, ensuring that future requests for such files would not produce any result. Thus, if there was an error while deleting any chunks, we wouldn't end with older revisions left over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will implement a solution consistent with rename
#1207 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would definitely benefit from some bulk strategy to remove the fs.files
documents first (likely using $in
on multiple _id
values), and then cleaning up the orphan chunks. I think it'd also benefit from being addressed in the GridFS spec, since we'd probably want to have a consistent way to address this in any driver that wishes to do so. And codifying that in a spec would ensure we're not overlooking some edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the bulk delete
- get all file ids by filename
- delete all files by id
- delete all chunks by file ids
The limitation is if there is more revisions that supported by a single command body.
cae12ee
to
03f4ea2
Compare
src/GridFS/CollectionWrapper.php
Outdated
$count = $this->filesCollection->deleteMany(['_id' => ['$in' => $ids]])->getDeletedCount(); | ||
$this->chunksCollection->deleteMany(['files_id' => ['$in' => $ids]]); | ||
|
||
return $count ?? 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is zero in case of WriteConcern=0 or not file to update.
src/GridFS/WritableStream.php
Outdated
foreach ($revisions as $file) { | ||
$this->collectionWrapper->findFileById($file->get('_id')); | ||
$this->collectionWrapper->deleteFileAndChunksById($file->get('_id')); | ||
$count++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the bulk delete
- get all file ids by filename
- delete all files by id
- delete all chunks by file ids
The limitation is if there is more revisions that supported by a single command body.
docs/reference/method/MongoDBGridFSBucket-registerGlobalStreamWrapperAlias.txt
Outdated
Show resolved
Hide resolved
src/GridFS/WritableStream.php
Outdated
public function delete(): int | ||
{ | ||
try { | ||
return $this->collectionWrapper->deleteFileAndChunksByFilename($this->file['filename']); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this is an internal class, so we're not actually introducing a public API here, but a delete()
method that affects multiple revisions seems odd, which is otherwise an abstraction for writing a single revision.
In the PR description, you said:
The context resolver mechanism introduced in #1138 doesn't give direct access to the CollectionWrapper which is required for this feature.
If we look at stream_open
, it accesses a CollectionWrapper after resolving the path to an array of context options. Couldn't the same approach be used here to access a CollectionWrapper directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt deep down that using ReadableStream
and WritableStream
wasn't correct. I refactored by adding a private getContext
method to use CollectionWrapper
directly for rename
and unlink
.
2d9c795
to
1d8e3c4
Compare
src/GridFS/StreamWrapper.php
Outdated
@@ -324,6 +295,14 @@ public function stream_write(string $data): int | |||
return $this->stream->writeBytes($data); | |||
} | |||
|
|||
public function unlink(string $path): bool | |||
{ | |||
$context = $this->getContext($path, 'r'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bucket::resolveStreamContext()
executes findFileByFilenameAndRevision()
for read modes in order to fetch the file document. Would it make sense to use 'w'
here and avoid the unnecessary query?
I understand rename()
needs to use 'r'
since it does read the filename from the file document; however, there may be a refactoring opportunity there if you'd just like to trust the $fromPath
string. I suppose it's a matter of intentionally NOP-ing on a FileNotFoundException or executing an update query that may not match anything -- so I'd be find leaving rename()
as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of using r
mode was to check that the file exists before doing operations, to throw a FileNotFoundException
. This could be optimized by checking the number of updated documents after the update, but that's not feasible if Write Concern = 0
(edge case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted to throw a FileNotFoundException
in rename
and unlink
, instead of silently returning false
. This behavior is consistent with S3 stream wrapper.
@@ -57,6 +58,9 @@ revision is read (revision ``-1``). | |||
In write mode, the stream context can contain the option ``gridfs['chunkSizeBytes']``. | |||
If omitted, the defaults are inherited from the ``Bucket`` instance option. | |||
|
|||
The functions `rename` and `unlink` will rename or remove all revisions of a | |||
filename. If the filename does not exist, these functions throw a ``FileNotFoundException``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmikola removing or renaming a single revision is not supported. Even if the option gridfs[revision]
is set.
$context = $this->getContext($fromPath, 'w'); | ||
|
||
$newFilename = explode('/', $toPath, 4)[3] ?? ''; | ||
$count = $context['collectionWrapper']->updateFilenameForFilename($context['filename'], $newFilename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edge case: if $fromPath
and $toPath
are identical, the file exists but you'll throw FileNotFoundException. That may warrant reverting to an "r" mode when selecting the context to throw eagerly.
Alternatively, you could have updateFilenameForFilename()
return the matched count and assume that the update was successful or a NOP. The only normal scenario where the matched and modified counts wouldn't be equals is if the paths were the same, in which case you could probably NOP earlier and avoid calling updateFilenameForFilename()
altogether (FWIW, PHP's rename()
returns true
in this case, but I dunno if there's some internal NOP behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I updated the code to return the matched count, and added a test for renaming with the same filename (existing and not existing).
src/GridFS/StreamWrapper.php
Outdated
@@ -338,6 +330,42 @@ public function url_stat(string $path, int $flags) | |||
return $this->stream_stat(); | |||
} | |||
|
|||
/** @psalm-return ($mode == 'r' ? array{collectionWrapper: CollectionWrapper, file: object} : array{collectionWrapper: CollectionWrapper, filename: string, options: array}) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is relevant, but the ternary here only checks for "r" and stream_open()
may call this with "rb". Can you invoke functions here, or are you restricted on syntax here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logical or
works. I also added the @return
annotation so PHPStorm understands it.
* master: PHPLIB-1323 Implement `unlink` for GridFS stream wrapper (mongodb#1206) PHPLIB-1330: Sync tests for failCommand errorLabels reqs (mongodb#1214) PHPLIB-1246: Test PHP 8.3 on Evergreen (mongodb#1213) PHPLIB-1324 Implement `rename` for GridFS stream wrapper (mongodb#1207) PHPLIB-1248 Add examples on GridFS (mongodb#1196) Deprecate setting GridFS disableMD5 to false explicitly (mongodb#1205) PHPLIB-1326: Use more permissive top-level runOnRequirements (mongodb#1210) PHPLIB-1206 Add bucket alises for context resolver using GridFS StreamWrapper (mongodb#1138) Bump actions/upload-artifact from 3 to 4 (mongodb#1208) PHPLIB-1275: Replace apiargs usage in docs with extracts (mongodb#1203) Fix title formatting in Client::removeSubscriber() docs (mongodb#1204) PHPLIB-1304: Pull mongohouse image from ECR repo (mongodb#1202) Fix evergreen failures (mongodb#1200) Enable workflows to run for GitHub Merge Queue (mongodb#1199) PHPLIB-1313 Ensure the GridFS stream is saved when the script ends (mongodb#1197) PHPLIB-1309 Add addSubscriber/removeSubscriber to Client class to ease configuration (mongodb#1195) Master is now 1.18-dev
Fix PHPLIB-1323
This is not optimized as it creates a
WritableStream
instance that is not really used.The context resolver mechanism introduces in #1138 doesn't give direct access to the
CollectionWrapper
which is required for this feature.